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Title 



Characterisation of glycans

Field of the Invention

This invention relates to a method for characterising glycans and their
derivatives, also known as oligosaccharides. In particular, the invention is a method for
identifying glycan structures by correlating experimentally determined mass
spectrometer fragment data with data held in a database of glycan fragments.

Background of the Invention

It is known that it is possible to identify glycan structures by correlating
experimentally determined mass spectrometer fragment data with data held in a
database of glycan fragments using manual interpretive methods. These methods
involve researchers comparing the spectrometer fragment data with known fragment
mass data. The problem with such manual methods is that they are slow and time
consuming.

These problems essentially arise from the complexity of the glycan structures.
The following list describes glycans, with reference to Figs. 1 and 2, and defines
terms that will be used in the remainder of the specification:

Structure - An oligosaccharide 1 consisting of monosaccharides 2 connected by
glycosidic bonds 3. A set of independent monosaccharides can be arranged as a linked
oligosaccharide through the glycosidic bonds between the monosaccharides. For the
purposes of scoring oligosaccharide matches, the oligosaccharide is defined as having a
direction - that is, a particular monosaccharide is defined as a reducing end
monosaccharide to which all other monosaccharides are either directly or indirectly
attached. Each different arrangement of monosaccharides is an isoform of the
oligosaccharide. The maximum number of isoforms for an m monosaccharide
oligosaccharide is given by the formula m^(m-1). This number is larger than the actual
number of isoforms that may be found in nature, as limitations exist on the number of
children a monosaccharide can have. Also, monosaccharides may share the same
mass, and so not all isoforms would be unique.
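To give a feel for how quickly this upper bound grows, the formula can be evaluated directly. The following is a minimal sketch; the function name `max_isoforms` is hypothetical and is not part of the specification.

```python
def max_isoforms(m: int) -> int:
    """Upper bound m^(m-1) on the number of isoforms of an
    oligosaccharide built from m monosaccharides."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    return m ** (m - 1)


if __name__ == "__main__":
    # Print the bound for small oligosaccharides.
    for m in range(1, 8):
        print(m, max_isoforms(m))
```

The bound counts every arrangement, so the number of isoforms actually found in nature will be smaller for the reasons given above.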

Reducing end - The end of the structure that is not involved in glycosidic
linkage.

Non-reducing end - Any end of the structure 4 that is not the reducing end of the
structure.

Edges - Are located between the monosaccharides. E1, E2, E3 and E4 are the
edges for the structure shown in Fig. 1.

Depth - Is the distance of the edges from the reducing end, so in Fig. 1:
depth(E1) < depth(E2) < depth(E3) = depth(E4), and
E1 has the highest rank.

Cleavage - A carbon bond in the structure is broken. Cleavage may be
glycosidic, cross-ring or special. Fig. 2 shows cleavage at E3 and E4.

Glycosidic cleavage - A cleavage involving the breakage of the glycosidic
bond.

Cross-ring cleavage - A cleavage involving the breaking of two of the
carbon-carbon bonds in one of the carbon rings of a saccharide.

Special cleavage - A cleavage which is diagnostically significant, but does not
directly fall into the glycosidic or cross-ring categories.

Single cleavage - A cleavage event that involves only a single glycosidic,
cross-ring or special cleavage event, i.e. 1-cleavage.

Multiple cleavage - A cleavage event that involves more than one cleavage
event. Can be described as n-cleavage events, i.e. 2-cleavage, 3-cleavage etc. Fig. 2 is
an example of a 2-cleavage event.

Fragment - A result of a single or multiple cleavage event. In Fig. 2 the
fragments are 21, 22 and 23.

Disjoint fragments - Fragments which do not have any common
saccharides.

Reducing end fragment - A fragment which contains the reducing end of the
structure. Fragment 21 in Fig. 2.

Non-reducing end fragment - A fragment which does not contain the reducing
end of the structure. Both fragments 22 and 23 in Fig. 2.

Cleavage type - Carbohydrate fragmentation patterns are discussed in the article
"A Systematic Nomenclature for Carbohydrate Fragmentations in FAB-MS/MS
Spectra of Glycoconjugates" by Bruno Domon and Catherine E Costello published in
Glycoconjugate J (1988) 5: 397-409, the entire contents of which are incorporated
herein by reference. "Domon and Costello" notation is the accepted norm for labelling
glycan fragment ions and is used herein.

Reducing end fragments may only be the result of particular types of cleavages.
For 1-cleavages, these are the Y, Z, X and certain special cleavage types. For
n-cleavages, reducing end fragments only occur where there are no B, C or A cleavages
amongst the set of cleavages that occur. For example, reducing end fragments include
Y, Z and Y/Z (Y and Z simultaneously) fragments. A B/Y fragment cannot be a
reducing end fragment.

Non-reducing end fragments can result from combinations of cleavage types that
only include a single non-reducing cleavage type. It is not possible to create a fragment
from more than one non-reducing cleavage type.

Peak - A peak in an MS/MS spectrum. This peak has a mass to charge ratio (m/z) and
relative intensity (relative to the largest peak in the spectrum).

Glycans may have numerous branch sites, indicated at 5 in Fig. 1, on each 
monosaccharide, as well as isomers and anomers. This results in complex 
fragmentation spectra in which the fragments observed may result from the different 
types of cleavage, cleavage in different locations and multiple cleavage. 

1-cleavage fragments generally tend to hold more sequence information than
2-cleavages. For a 1-cleavage event, the oligosaccharide is split into two parts, one
containing the reducing end and the other containing the non-reducing end section. It is
possible to conclusively infer the composition of a complementary 1-cleavage fragment
from the composition of a 1-cleavage fragment, since the composition of the full
oligosaccharide is known. Reducing end 1-cleavage fragments are especially important
for sequencing as the composition of the fragment containing the reducing end is
unambiguously determined. Also, since the reducing end fragment composition is
known, the composition of the non-reducing end fragment can be inferred from the
difference in composition between the reducing end fragment and the full
oligosaccharide.
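The inference of a complementary fragment's composition is a simple multiset difference. The following is a minimal sketch of that step; the function name and the residue labels (`Hex`, `HexNAc`, `NeuAc`) are illustrative assumptions and are not taken from the specification.

```python
from collections import Counter


def complementary_composition(full: Counter, fragment: Counter) -> Counter:
    """Composition of the complementary 1-cleavage fragment, obtained as the
    difference between the full oligosaccharide and the observed fragment."""
    # Counter subtraction keeps only positive counts, so a non-empty result
    # here means the fragment contains residues the full structure lacks.
    excess = fragment - full
    if excess:
        raise ValueError(f"fragment is not part of the structure: {dict(excess)}")
    return full - fragment


if __name__ == "__main__":
    full = Counter(Hex=5, HexNAc=4, NeuAc=1)   # hypothetical composition
    reducing_end = Counter(Hex=3, HexNAc=2)    # hypothetical observed fragment
    print(complementary_composition(full, reducing_end))
```

The same difference cannot be split further for a 2-cleavage, which is why the two "lost" fragments of a 2-cleavage remain ambiguous.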

2-cleavage events generally result in three possible fragments being created. The
composition of only one of these fragments is ever fully characterised. Furthermore, the
position of the reducing end monosaccharide is only disambiguated for 2-cleavage
events when the fragment is a reducing end fragment. For reducing end 2-cleavage
results, the composition of each of the two "lost" fragments cannot be unambiguously
determined. Similarly, for a multiple reducing and non-reducing 2-cleavage, the
compositions of the two complementary fragments from the main fragment cannot be
unambiguously determined. Since only the composition of the parts of the
oligosaccharide seen in a fragment can be accurately determined from 2-cleavage
events, there is a greater degree of uncertainty about the arrangements of the
monosaccharides in the complementary fragments.

Any discussion of documents, acts, materials, devices, articles or the like which
has been included in the present specification is solely for the purpose of providing a
context for the present invention. It is not to be taken as an admission that any or all of
these matters form part of the prior art base or were common general knowledge in the
field relevant to the present invention as it existed in Australia before the priority date
of each claim of this application.

Summary of the Invention

In a first broad aspect of the present invention, there is provided a method for
characterising the structure or sub-structure of a glycan or glycan derivatives,
comprising the steps of:

Experimentally deriving the mass of an unidentified glycan molecule.
Comparing the mass of the unidentified glycan molecule with identified and
characterised glycan structures to select candidate structures for the glycan molecule.
Experimentally deriving the mass of fragments of the glycan molecule.
Theoretically fragmenting the selected candidates.
Matching the mass of the fragments of the unidentified glycan molecule with the
mass of fragments theoretically derived from the candidate structures.
Scoring to produce ranked confidence scores for each of the candidate structures
by comparing the masses of the experimentally derived fragments with the masses of
the theoretically derived fragments.



Such a system is able to provide high throughput characterisation of glycans by
mass spectrometry, and automatic, comprehensive and rapid identification of glycan
structures, while at the same time it supports a non-biased interpretation of
spectra, based on the interpreter's knowledge.

If insufficient confidence is obtained in the highest ranked score, the process can
be repeated by taking into account more complex cleavage patterns in the theoretical
fragmentation step, or by obtaining further spectra.

The initial data set used for comparison with the experimentally determined
mass may consist of only fragments that are the result of 1-cleavage fragmentation. It
may also include 2-cleavage events which are formed exclusively from glycosidic
cleavage types. The glycosidic cleavage pattern is the parameter that contains
information about oligosaccharide sequence. This limited set of fragments provides
enough data for the primary sequence scoring method to work. The increase in data set
size by adding more fragments is limited by refining the data set when required. This
way, by restricting the types of fragments generated based upon the results of the
scoring, it is possible to keep the data set to a manageable size.



In order to characterise oligosaccharides there are two criteria that need to be
fulfilled: the sequence has to be identified, and the linkage configuration and position
have to be determined. Information about either of the two will provide valuable data. In
a broad sense mass spectrometry will be able to predict the sequence information while
linkage information will be more difficult to obtain with that technique. Cross ring
cleavages or specific cleavages will have the potential to enable linkage position to be
determined, while the linkage anomery is the parameter that is most difficult to obtain.
On a computational basis, sequencing will be able to generate in silico glycosidic and
cross ring fragments solely on a mathematical basis, but information about linkage
anomery cannot be included. The characteristic in a fragmentation spectrum that has the
potential of including some information about anomery is peak intensity. This is purely
[...] since it could be envisaged that different anomeric configurations may undergo
fragmentation rearrangements by different [...]. Fragment intensities will of course
also depend on other parameters.

Scoring methods will be designed to do the following:

1. Provide a quality scoring based on the sequence, allowing judgement of
whether the sequence at least is correct.

2. Provide a ranking between oligosaccharides based on sequence (glycosidic
cleavages).

3. Provide a ranking between oligosaccharides based on linkage
(cross ring cleavages).

4. Provide a ranking between oligosaccharides based on other [...],
including generic n-cleavages and other special cleavage types. A special
cleavage is a cleavage that produces a fragment that is specific to [...] and
may include the loss of water, for example.



In a further aspect a scoring method is provided involving segmentation scoring
that counts the number of possible conformations for an oligosaccharide identified by a
set of matching oligosaccharide fragments. By determining how well a particular
conformation is supported by the evidenced fragments, it is possible to gauge the
quality of match for the particular structure. The score for ordered segments (that is,
where the segment it connects to is known) arising from 1-cleavage fragmentation is
calculated to be the number of arrangements for each segment multiplied by the
minimum number of points that each segment can attach to in its next segment.

The score can be calculated as the number of arrangements of monosaccharides
in the segment, and since the next segment that an ordered segment can attach to is
known, it is possible to know how many points that an ordered segment will attach to.

Additional information from 2-cleavages that span the boundary between the
two segments can reduce the possible number of positions that the segment can be
connected to by essentially anchoring the 2-cleavage segment.

Further adjustment may take account of uneven sub-segment size and multiple
independent cleavage events.

The fragment generation process will preferably omit redundant fragments and,
when known, chemically impossible fragmentation to reduce the amount of fragments
and data to be processed to make the method more efficient.

The identification of glycan differences offers indicators for recognition of
glycosylation differences which for example can occur on proteins, lipids or
proteoglycans. These variants have been linked to disease, cell differentiation, cell
communications, immunological recognition and other significant characteristics.

Brief Description of the Drawings

Specific embodiments of the present invention will now be described, by way of
example only, and with reference to the accompanying drawings in which:

Figure 1 illustrates the depth of edges in a glycan structure S where: depth(E1)
< depth(E2) < depth(E3) = depth(E4);

Figure 2 illustrates disjoint fragments where edges E3 and E4 have been cut;

Figure 3 illustrates a non-disjoint double non-reducing end fragment;

Figure 4 schematically illustrates the method of glycan mass fingerprinting of
the present invention;

Figure 5a is a graph showing a spectrum of peak masses of an experimentally
fragmented oligosaccharide illustrating the fragments assigned to peaks in the
spectrum;

Figure 6 shows an oligosaccharide structure of Example 1;

Figure 7 is a graph showing the spectra of the oligosaccharide structure of
Figure 6;

Figures 8a to 8c are parts of a table giving the score, missed intensities and
grouping score for a number of oligosaccharide structures which potentially match the
oligosaccharide structure of Figure 6;

Figure 9 shows an oligosaccharide structure of Example 2;

Figure 10 is a graph showing the spectra of the oligosaccharide structure of
Figure 9;

Figure 11 shows a table giving the score, missed intensities and grouping score
for a number of oligosaccharide structures which potentially match the oligosaccharide
structure of Figure 9;

Figures 12a, b and c are a series of diagrams showing 1-cleavage fragmentation
occurring at a glycosidic bond;

Figures 13a and b are a series of diagrams showing how a reducing end
2-cleavage event segments an oligosaccharide into three segments;

Figure 14 is a diagram showing how a non-reducing 2-cleavage event creates
three segments; and

Figure 15 is a series of diagrams illustrating how a non-reducing 2-cleavage
event creates three segments that can be arranged together in nine possible ways.

Detailed Description of Preferred Embodiments 

The characterising process is called Glycan Mass Fingerprinting or GMF, and is
outlined in Fig. 4:

Experiments derive the mass of an unidentified glycan molecule 40.

Theoretical work begins with a database of identified and characterised glycan
structures 41.

Preliminary matching involves comparing the mass of the unidentified glycan
molecule with the identified and characterised glycan structures to select candidate
structures for the glycan molecule.

Further experiments derive the mass of fragments of the molecule 42.

Theoretical fragmentation is then performed using the selected candidates 43.

Matching 44 involves comparing the mass of the fragments of the unidentified
glycan molecule with the mass of fragments theoretically derived from the candidate
structures.

Scoring 45 produces ranked confidence scores for each of the candidate
structures by comparing the masses of the experimentally derived fragments with the
masses of the theoretically derived fragments. A number of different scoring regimes
are available.

If insufficient confidence is obtained in the highest ranked score, the process can
be repeated 46 by taking into account more complex cleavage patterns in the theoretical
fragmentation step 43, or by obtaining further spectra.



Derive the mass of a glycan molecule 40.

A user will supply the mass of an unidentified glycan [...] of mass spectroscopy.

Database of identified and characterised glycan structures 41.

A number of suitable databases are available. For example, GlycoSuiteDB
available at "www.glycosuite.com" provides a database of identified and characterised
glycan structures as does the database "Glycominds". The database can be in simple
table form, or can be in a relational form to exploit other information that may be
associated with glycan structures such as biological source information.
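A database in simple table form, together with the mass-based selection of candidates that uses it, might be sketched as follows. This is only an illustration: the record fields, the tolerance value and every entry in the table are hypothetical placeholders, and no real database interface is implied.

```python
from typing import List, NamedTuple


class GlycanRecord(NamedTuple):
    structure_id: str   # identifier of the characterised structure
    mass: float         # mass of the intact structure
    source: str         # optional biological source information


def select_candidates(records: List[GlycanRecord],
                      observed_mass: float,
                      tolerance: float = 0.5) -> List[GlycanRecord]:
    """Return the records whose mass lies within `tolerance` of the observed mass."""
    return [r for r in records if abs(r.mass - observed_mass) <= tolerance]


if __name__ == "__main__":
    # Placeholder rows only; these are not real glycan masses.
    table = [
        GlycanRecord("G1", 1000.0, "source A"),
        GlycanRecord("G2", 1000.3, "source B"),
        GlycanRecord("G3", 1200.0, "source A"),
    ]
    for record in select_candidates(table, observed_mass=1000.1):
        print(record.structure_id, record.mass)
```

A relational form would add joins to source information, but the selection criterion on mass would stay the same.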

Preliminary matching involves comparing the mass of the unidentified glycan
molecule with the identified and characterised glycan structures to select candidate
structures for the glycan molecule.

Experiments derive the mass of fragments of the molecule 42.

Individual oligosaccharides could be submitted to GMF after mass spectrometry
under conditions producing fragment ions, for example by tandem mass spectrometry
or in-source fragmentation, or alternatively oligosaccharide mixtures could be separated
into individual components with separating methods hyphenated with mass
spectrometry. This includes techniques such as HPLC and capillary electrophoresis.
Various ionisation methods and conditions could be used. Multiple stages of mass
spectrometry could also be used, where further fragmentation of fragment ions is
required.



Theoretical fragmentation of the selected candidates 43.

In the present invention, a database of the theoretical peak masses for all
possible glycan fragments, along with their unfragmented molecular parent mass, is
produced by collating the set of theoretical fragments for an entire database of
identified and characterised glycan structures.

In a refinement of the invention, in order to match against and identify novel
glycan structures which are not already disclosed in existing databases, it is equally
feasible to construct a theoretical database of all possible fragmentations of the much
larger set of theoretically possible glycans. It is envisaged that this much larger
database will be used for a second pass search in which a glycan's fragment masses do
not satisfactorily match to any known glycan fragment fingerprint.

In order to obtain theoretical peak masses for a glycan structure, an algorithm is
needed to generate sets of fragments for the full sets of n-cleavages for a structure. The
method used for generating fragments is based on a combinatorial/permutation method.
The method can be broken into two stages, namely edge selection and cleavage
assignment.

Edge Selection

A structure S is composed of m monosaccharides with m-1 glycosidic bonds
existing between monosaccharides. In order to generate a full set of fragments for
n-cleavages, we need to consider the breakage of bonds at n positions (where n ≤ m-1).
There exist C(m-1, n) combinations of glycosidic cleavage points (edges) for an n-cleavage
fragmentation. In order to minimise size complexity an iterative method is used to
generate all combinations of edges. E is the k-subset of the edges found in S; k can be
any number up to (m-1).

For example the 2-subset is a set of all combinations of edges where two edges
are combined. For the example shown in Fig. 1 there are four edges E1, E2, E3 and
E4. For a double cleavage, k=2 and the k-subset comprises all possible combinations
of E1, E2, E3 and E4 taken two at a time, namely (E1,E2), (E1,E3), (E1,E4), (E2,E3),
(E2,E4) and (E3,E4).
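The enumeration of k-subsets can be sketched with the standard library. This is an illustration only, using the four edges of Fig. 1; the function name `edge_subsets` is hypothetical, and `itertools.combinations` is used here in place of whatever iterative method the implementation employs.

```python
from itertools import combinations
from math import comb


def edge_subsets(edges, k):
    """All k-subsets of the edges, i.e. the candidate cleavage points
    for a k-cleavage fragmentation."""
    return list(combinations(edges, k))


if __name__ == "__main__":
    edges = ["E1", "E2", "E3", "E4"]   # the four edges of Fig. 1
    pairs = edge_subsets(edges, 2)
    print(pairs)
    # The number of subsets equals the binomial coefficient C(4, 2).
    print(len(pairs), comb(len(edges), 2))
```

`combinations` yields subsets lazily, so for large structures the subsets can be consumed one at a time rather than held in a list.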

The edges within each k-subset are then sorted according to depth, which
produces an edge vector. Edges that involve monosaccharides closer to the reducing
end are sorted with a higher rank than edges occurring at a greater depth. For example,
Fig. 1 illustrates the depth of edges in a glycan structure S where:
depth(E1) < depth(E2) < depth(E3) = depth(E4). Suppose edges E1 and E2 are selected from
that Figure as the two edges. The k-subset of edges is (E2,E1) and once sorted,
the edge vector will be (E1,E2) since E1 is closer to the reducing end of the structure,
which is conventionally drawn on the right of the structure and is the end in which the
hydroxyl on C-1 is not extended with additional monosaccharide units. The ordering
of edges is crucial to ensuring the accurate generation of fragments, as it is possible to
choose particular cleavages to assign to the edges so that a disjoint fragment is
generated. Thus with reference to Figure 2, if edges E3 and E4 are cut, two separate
fragments are created.
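Sorting a k-subset into an edge vector can be sketched as below. The numeric depths are assumptions chosen only to respect the ordering stated for Fig. 1 (E1 shallowest, E3 and E4 equal); the function name is hypothetical.

```python
def edge_vector(subset, depth):
    """Sort a k-subset of edges by depth so that the edge closest to the
    reducing end (smallest depth, highest rank) comes first."""
    return tuple(sorted(subset, key=lambda edge: depth[edge]))


if __name__ == "__main__":
    # Assumed depths consistent with depth(E1) < depth(E2) < depth(E3) = depth(E4).
    depth = {"E1": 1, "E2": 2, "E3": 3, "E4": 3}
    print(edge_vector(("E2", "E1"), depth))
    print(edge_vector(("E4", "E1", "E3"), depth))
```

Python's `sorted` is stable, so edges of equal depth such as E3 and E4 keep the order in which they were supplied.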

Carbohydrate fragmentation patterns are discussed in the article "A Systematic
Nomenclature for Carbohydrate Fragmentations in FAB-MS/MS Spectra of
Glycoconjugates" by Bruno Domon and Catherine E Costello published in
Glycoconjugate J (1988) 5: 397-409, the entire contents of which are incorporated
herein by reference. "Domon and Costello" notation is the accepted norm for labelling
glycan fragment ions and is used herein.

Reducing end fragments may only be the result of particular types of cleavages.
For 1-cleavages, these are the Y, Z, X and certain special cleavage types. For
n-cleavages, reducing end fragments only occur where there are no B, C or A cleavages
amongst the set of cleavages that occur. For example, reducing end fragments include
Y, Z and Y/Z (Y and Z simultaneously) fragments. A B/Y fragment cannot be a
reducing end fragment.

Non-reducing end fragments can result from combinations of cleavage types that
only include a single non-reducing cleavage type. It is not possible to create a fragment
from more than one non-reducing cleavage type.

Calculating all the possible fragments is computationally intensive. Where two
or more cleavages occur, some of those 2-cleavages will produce fragments that will
already have been accounted for in the 1-cleavages. For example, the reducing end
fragment produced when the two edges E1 and E4 are cut is also produced as a result of
a 1-cleavage at E1. The results of the E4 cleavage are not used, as E4 did not reside on
the reducing end fragment that was a result of the E1 cleavage. Any fragments produced
by a 2-cleavage that are also produced by a 1-cleavage do not need to be calculated.
Generally, when generating all n-cleavages, any fragments that could be produced by an
m-cleavage (where m < n) are discarded.
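One way to discard such redundant fragments is to process cleavage orders from lowest to highest and remember what has already been produced. The sketch below assumes, purely for illustration, that a fragment is identified by the set of monosaccharides it contains; the function name and the labels `M1`, `M2`, `M3` are hypothetical.

```python
def discard_redundant(fragments_by_order):
    """Keep, for each cleavage order n, only the fragments that were not
    already produced by a lower order (or earlier within the same order)."""
    seen = set()
    unique = {}
    for order in sorted(fragments_by_order):
        kept = []
        for fragment in fragments_by_order[order]:
            if fragment not in seen:
                seen.add(fragment)
                kept.append(fragment)
        unique[order] = kept
    return unique


if __name__ == "__main__":
    fragments = {
        # 1-cleavage at E1 leaves a reducing end fragment containing M1 only.
        1: [frozenset({"M1"})],
        # Cutting E1 and E4 yields the same reducing end fragment again.
        2: [frozenset({"M1"}), frozenset({"M2", "M3"})],
    }
    print(discard_redundant(fragments))
```

Identifying a fragment by its monosaccharide set is a simplification; an implementation that also tracks cleavage type would need a richer key.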

For each combination of edges obtained in the edge selection step a fragment
can be generated by applying a set of fragment types to it. Referring now to Figure 3,
which shows a non-disjoint double non-reducing end fragment, consider a combination
of edges formed from a 2-cleavage event consisting of Edge A and Edge B. At Edge A,
the possible cleavage types that could have occurred are all reducing and non-reducing
end cleavages. At Edge B, only reducing end cleavages could have occurred. Only
reducing end cleavages occur at Edge B as it is not possible to have two non-reducing






COMS to No: SMBHJ0476B38 Received by IP Australia: Time (Hm) 16:18 Date 



(Y-M-d) 2003-10-31 






end cleavage types resulting in a non-disjoint fragment. A fragment of this type would
in fact be identical to a single cleavage occurring at the edge B with the greatest depth.

Cleavage Assignment

To assign cleavages to fragments, we map the selection of cleavage types onto
each element of E.

T = the set of n-element cleavage type permutations.

for t ∈ T, |t| = n

∀ e ∈ E, ∀ t ∈ T: Fragment = (t, e), i.e. (fragment type, position)

T is restricted so that each n-element permutation of cleavage types does not
[...] more than one non-reducing end fragment. [...] occurring, the structure is
checked to ensure that the structure can support the fragment.
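The mapping of cleavage types onto an edge vector, with the restriction to at most one non-reducing end cleavage type, can be sketched as follows. The grouping of X, Y, Z as reducing end types and A, B, C as non-reducing end types follows the Domon and Costello notation used above; special cleavage types and the structural validity check are deliberately left out, and the function name is hypothetical.

```python
from itertools import product

REDUCING_END_TYPES = ("X", "Y", "Z")
NON_REDUCING_END_TYPES = ("A", "B", "C")


def cleavage_assignments(edge_vector):
    """Enumerate (type, edge) assignments for an edge vector, keeping only
    those with at most one non-reducing end cleavage type."""
    all_types = REDUCING_END_TYPES + NON_REDUCING_END_TYPES
    assignments = []
    for types in product(all_types, repeat=len(edge_vector)):
        non_reducing = sum(t in NON_REDUCING_END_TYPES for t in types)
        if non_reducing <= 1:
            # Pair each cleavage type with the edge at the same position.
            assignments.append(tuple(zip(types, edge_vector)))
    return assignments


if __name__ == "__main__":
    for assignment in cleavage_assignments(("E1", "E2")):
        print(assignment)
```

For two edges this keeps 27 of the 36 possible type pairs, since the 9 pairs built from two non-reducing end types are rejected.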



Basic checking occurs to invalidate any [...] fragments where for a reducing [...]
type assigned to a cleavage point, a traversal to the reducing [...] not traverse any
other cleavage points. Non-reducing end fragments [...] valid if for any of the
reducing-end cleavage points a traversal [...] does not pass a B cleavage point.
Checking occurs by starting [...] occurring at the least depth (closest to the reducing
end), [...] the reducing end, and marking any monosaccharide that [...] repeated [...]
which causes the loss of branches containing marked monosaccharides due to an A
cleavage type is discarded.

[...] fragmentation occurs of the structure. This process involves removing branches
from the virtual representation of the structure so that it will represent the structure of
[...]. Once the virtual fragment has been generated the mass can be obtained [...] by
looking [...] of fragmentation [...] Domon and Costello notation and assigned to the
fragment. [...] of fragments is a combinatorially [...] fragments [...] of allowed
cleavages [...] against [...]



are stored in a database. Typically the fragments for 1-cleavages, and 2-cleavages from
exclusively glycosidic cleavages, will initially be used.

Matching 44.

Matching involves comparing the mass of the fragments of the unidentified
glycan molecule with the mass of fragments theoretically derived from the candidate
structures.

Scoring 45.

A user will supply a spectrum, which consists of tuples of m/z and intensity.
Each tuple is called a peak. The peak mass is converted into a true mass by
correcting for charge state and adduct, and then compared against the set of theoretical
fragments to find any fragments which have a mass within the tolerance range of the
peak's true mass. The fragments are then collated according to the parent structure and
scored.
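The conversion of a peak to a true mass and the tolerance match against theoretical fragments can be sketched as below. The sketch assumes that each charge is carried by one adduct of known mass shift (for example a proton in positive mode), which is a common convention but is not stated in the text; all function names and the example fragment table are hypothetical.

```python
from collections import defaultdict

PROTON_MASS = 1.007276  # Da, rounded; used as the adduct shift for [M+H]+ ions


def true_mass(mz, charge, adduct_shift):
    """Neutral mass from m/z, assuming one adduct of mass `adduct_shift`
    per charge (use a negative shift for a loss such as deprotonation)."""
    return charge * (mz - adduct_shift)


def match_peaks(peaks, fragments, charge, adduct_shift, tolerance):
    """Collate matching theoretical fragments by parent structure.

    peaks:     iterable of (m/z, intensity) tuples
    fragments: iterable of (mass, parent structure, fragment label) tuples
    """
    matches = defaultdict(list)
    for mz, intensity in peaks:
        mass = true_mass(mz, charge, adduct_shift)
        for fragment_mass, parent, label in fragments:
            if abs(fragment_mass - mass) <= tolerance:
                matches[parent].append((label, mz, intensity))
    return dict(matches)


if __name__ == "__main__":
    # Placeholder values only; these are not real fragment masses.
    theoretical = [(998.0, "S1", "Y2"), (700.0, "S2", "B3")]
    spectrum = [(999.0, 80.0), (450.0, 12.0)]
    print(match_peaks(spectrum, theoretical, charge=1,
                      adduct_shift=PROTON_MASS, tolerance=0.1))
```

The per-parent lists returned here are the input that the quality and relative scoring methods would then work on.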

Scoring is a two-part process: one part to determine the sequence
quality of match of a candidate structure, and another to rank the candidate structures
relative to each other. The families of algorithms for each scoring type are defined as
quality and relative scoring methods respectively. Based on the combination of these
two scoring methods, it is possible to determine the likelihood of a result structure
being the one defined by the input spectrum, in regard to sequence or linkage
information or both.

Quality Scoring

The quality score for a result encapsulates how well the fragments matched for a
sequence define that sequence. For example, a result structure that matches only a
single small fragment will be a low quality result, whilst a structure which has many
fragments matched which are distributed over the entire structure will have a high
quality score. One such quality scoring algorithm is a grouping algorithm.

Group scoring derives the cleavage points from the fragment types, and obtains
a number which represents how well the structure is characterised by the set of
fragments associated with it. The best fragments used to characterise a structure are
those resulting from 1-cleavages. If there are m - 1 unique cleavage points found in a
glycan structure's associated 1-cleavage fragments for a glycan having m
monosaccharides, then there is enough evidence in the fragments that the sequence of
the structure is valid.

Fragments resulting from 2-cleavages do not necessarily indicate the presence of
a specific cleavage point in a structure. 1-cleavages are special as the presence of a
fragment is enough evidence to prove that a fragmentation occurred at the cleavage point.
2-cleavages can be considered as a fragmentation of a fragmentation. One of the
cleavage points in a 2-cleavage can be used as evidence if the other cleavage point has
evidence supporting its existence. In other words, the 2-cleavage must have an overlap
with another 1-cleavage, or with 2-cleavages where one of the cleavages has been assigned,
for it to contain an equal amount of information. For this reason, 2-cleavages are not
weighted as importantly as 1-cleavages. Any scoring method that examines cleavage
points should be able to encapsulate this information. One possible algorithm involves
a process of trying to fulfil each cleavage point in the original structure with a matched
fragment. Whenever possible the grouping scoring algorithm will try to use a single
cleavage fragment to fulfil the cleavage point. If the cleavage point cannot be fulfilled
by a 1-cleavage fragment, it will use a 2-cleavage fragment. The actual score assigned
is derived using:

Equation 1

Score = (a - 0.25b) / (m - 1)

where a is the number of cleavage points assigned to 1-cleavage events, and b is the
number of cleavage points assigned to 2-cleavage fragments. A structure whose
cleavage points are strongly supported by its fragments is assigned a score closer to 1.
This method can be extended to handle generic n-cleavages where n is greater than 1,
by extending the formula to appropriately weight the importance of the cleavages and
further subtracting those from a.

It should be noted that the above is only one simple type of scoring equation and 
that other equations could be used to perform the same function encapsulating the 
information from both single and double cleavages. 
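Equation 1 is simple enough to evaluate directly. The following minimal sketch implements the equation exactly as printed, with a, b and m as defined above; the function name `group_score` is hypothetical.

```python
def group_score(a: int, b: int, m: int) -> float:
    """Equation 1: (a - 0.25*b) / (m - 1).

    a: cleavage points assigned to 1-cleavage events
    b: cleavage points assigned to 2-cleavage fragments
    m: number of monosaccharides in the structure
    """
    if m < 2:
        raise ValueError("a structure needs at least two monosaccharides")
    return (a - 0.25 * b) / (m - 1)


if __name__ == "__main__":
    # All four cleavage points of a 5-residue glycan covered by 1-cleavages.
    print(group_score(a=4, b=0, m=5))
    # Three covered by 1-cleavages and one by a 2-cleavage fragment.
    print(group_score(a=3, b=1, m=5))
```

With every cleavage point supported by a 1-cleavage fragment the score is exactly 1, which matches the statement that strongly supported structures score closer to 1.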



Segmentation Scoring - a special case of Group Scoring

Segmentation scoring is a qualitative scoring method that counts the number of
possible conformations for an oligosaccharide identified by a set of matching
oligosaccharide fragments. By determining how well a particular conformation is
supported by the evidenced fragments, it is possible to gauge the quality of match for
the particular structure.

Simple segmentation

A 1-oleavage fragmentation that occurs on an oligosaccharide can be considered 
as evidence of particular sequence characteristics of the oligosaccharide. For example, 
an oligosaccharide where a 1 -cleavage fragmentation occurs at a glycosidic 
xuis ^assentation provides two pieces of evidence about the sequence. We can 
consider the fragmentation to have split the oligosaccharide into two parts - S' and S" 
Both S and S" arc segments of the oligosaccharide. A segment of an oligosaccharide is 
itself an oligosaccharide, and is used to help measure the worth of evidence of a 
particular fragmentation. 

S'' contains the reducing end of the oligosaccharide somewhere within its set of 
monosaccharides. All monosaccharides contained within S'' can attach to the reducing 
end, or form chains of monosaccharides terminating at the reducing end 
monosaccharide. For the monosaccharides contained in S' to be attached to the 
reducing end, there must be a single child monosaccharide connected from S' to S''. A 
monosaccharide in S'' cannot be a [...] 



This fragmentation provides three pieces of evidence about sequence for this 
oligosaccharide: 

all monosaccharides within S' must be connected to at least one monosaccharide 
in S' 

all monosaccharides within S'' must be connected to at least one monosaccharide 
in S'' 

a single monosaccharide in S' is connected to another monosaccharide in S'' 

These three pieces of evidence can be used to construct a set of oligosaccharides which 
can support a fragment with an identical mass to the found fragment. Further evidence, 
such as the full monosaccharide composition, is also used to construct these 
oligosaccharides. 

Although it is algorithmically possible to create and sequence all possible 
structures which may support this fragmentation, it is only necessary to count the 
number of structures that will be generated. The total number of structures can be 
calculated by enumerating both the possible arrangements of a segment, and the 
number of ways that a particular segment may be attached to another segment. For S', 
we can calculate the number of possible arrangements of the m monosaccharides 
contained in S' using the formula m^(m-1). Similarly, S'' can be arranged in n^(n-1) ways. 
To calculate the number of ways that S' can be attached to S'', we consider the number 
of positions at which the reducing end monosaccharide from S' can attach to a 
monosaccharide in S''. Since S'' comprises n monosaccharides, there are n possible 
attachment positions. In total, there are m^(m-1) x n^(n-1) x n possible arrangements of 
monosaccharides. 
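
The count for a single 1-cleavage can be sketched in Python as follows. This is a minimal illustration, assuming that a segment of k monosaccharides has k^(k-1) arrangements and that S' (m monosaccharides) attaches to one of the n monosaccharides of S''; the function names are hypothetical.

```python
def arrangements(size: int) -> int:
    """Number of arrangements of a segment of `size` monosaccharides,
    using the formula size^(size-1)."""
    if size < 1:
        raise ValueError("a segment must contain at least one monosaccharide")
    return size ** (size - 1)


def simple_segmentation_count(m: int, n: int) -> int:
    """Arrangements supported by one 1-cleavage fragment.

    m: monosaccharides in S' (the segment without the reducing end)
    n: monosaccharides in S'' (the segment containing the reducing end)
    S' may attach at any of the n monosaccharides of S''.
    """
    return arrangements(m) * arrangements(n) * n


if __name__ == "__main__":
    print(arrangements(6))                  # unconstrained hexasaccharide
    print(simple_segmentation_count(5, 1))  # five residues attached to one
```

For a hexasaccharide, splitting off a single reducing end monosaccharide reduces the count from 6^5 to 5^4 x 1 x 1.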

Complex segmentation 

Although segmentation is simple for a single fragment, multiple fragments 
significantly complicate the process of segmentation. 

2-cleavage fragmentation 

There are two cases for the segmentation resulting from a 2-cleavage event: 
Reducing end 2-cleavage event - When a reducing end 2-cleavage event 
fragment is used as evidence for segmentation, it segments the oligosaccharide into 
three segments. S'' and S''' can attach to any position in S', since the evidence is only for 
two glycosidic cleavages to have occurred. 

Non-reducing end 2-cleavage event - A non-reducing 2-cleavage event also 
creates three segments. Since no directional information is stored in this fragment, and 
the reducing end may be contained in S'' or S''', S' may be attached to any of S'' or S'''. 
Similarly, S'' may be attached to S' or S''', and S''' may be attached to S' or S''. There are 
9 possible ways that S', S'' and S''' can be arranged together. Let S' contain x 
monosaccharides, S'' y monosaccharides, and S''' z monosaccharides. The full structure 
contains m monosaccharides. There are 

[...] possible arrangements of oligosaccharides to 
fulfil these conditions. 

Multiple independent 1-cleavage events 

Multiple 1-cleavage events complicate the segmentation process by introducing 
nested segments. A nested segment is a segment created from a segmentation of an 
existing segment. The original segment will change from containing a set of 
monosaccharides to containing a set of segments. Segments are created by considering 
fragmentation evidence from the non-reducing terminal monosaccharides and working 
towards the reducing end. As each piece of fragmentation evidence is applied to the 
segmentation, the segment containing the reducing end is further segmented. A 
subset of the set of structures is created by this refinement of 
segmentation, resulting in a reduction in the number of possible structures that can be 
created with each successive fragment accounted for as evidence. Once all 1-cleavage 
events have been accounted for, a set of segments with defined relationships between 
each other is found. 

Multiple 2-cleavage events 

A set of segments along with information regarding which segments are 
definitely attached to other segments is created at the end of the previous stage. For any 
segments which contain more than one monosaccharide, further fragments are 
interrogated to find any further evidence of sequence. 2-cleavage reducing end 
cleavages are treated by intersecting the segments created by the 2-cleavage event with 
the existing segments. Other 2-cleavage events can only be relied on for the grouping 
of monosaccharides in the fragment. This grouping of monosaccharides is also treated 
as a segment, and intersected with the existing segments. Once all fragments have been 
used to create segments, the oligosaccharide is maximally segmented, i.e. all groupings 
of monosaccharides have been merged to produce the smallest groups of 
monosaccharides possible. 
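
The intersection step can be sketched as a refinement of a partition of monosaccharide labels. This is a minimal illustration under the assumption that each piece of fragment evidence is reduced to a grouping of labels; it tracks only the groupings, not which segment attaches to which, and the function names and single-letter labels are hypothetical.

```python
from typing import FrozenSet, Iterable, List


def refine(segments: List[FrozenSet[str]], group: Iterable[str]) -> List[FrozenSet[str]]:
    """Intersect every existing segment with a new grouping.

    Each segment is split into the part inside the grouping and the
    part outside it; empty parts are dropped.
    """
    grouping = frozenset(group)
    refined: List[FrozenSet[str]] = []
    for segment in segments:
        inside = segment & grouping
        outside = segment - grouping
        if inside:
            refined.append(inside)
        if outside:
            refined.append(outside)
    return refined


def maximally_segment(monosaccharides: Iterable[str],
                      groupings: Iterable[Iterable[str]]) -> List[FrozenSet[str]]:
    """Apply every grouping in turn, starting from one segment that
    contains all monosaccharides."""
    segments = [frozenset(monosaccharides)]
    for grouping in groupings:
        segments = refine(segments, grouping)
    return segments


if __name__ == "__main__":
    result = maximally_segment("ABCDEF", ["A", "ABF", "ABCDE", "CD"])
    print(sorted("".join(sorted(s)) for s in result))
```

Applying the groupings {A}, {A, B, F}, {A, B, C, D, E} and {C, D} to six labelled monosaccharides leaves the groups {A}, {B}, {F}, {C, D} and {E} in this sketch.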

Calculating the score 



The score is calculated by calculating the score for the ordered segments, which 
in turn calculates the score for unordered segments. The ordered segments are segments 
where the segment that each connects to is known. Ordered segments arise from 
1-cleavage fragmentation. To calculate the score for ordered segments, the number of 
arrangements for each segment is calculated, which is then multiplied by the 
number of points that each segment can attach to in its next segment. 

Score = Π [ score(segment) x minimum number of positions that segment can attach at ] 

The score of each segment is calculated in one of two ways. If the segment has 
been sub-segmented by 2-cleavage fragmentation, a method detailed later is used. If no 
further sub-segmentation has occurred, the score is calculated as the number of 
arrangements of monosaccharides in the segment. Since the next segment that an 
ordered segment can attach to is known, it is possible to know how many points that an 
ordered segment will attach to. With no additional information, the segment can attach 
to as many monosaccharides as there are in the next segment. Additional information 
from 2-cleavages that span the boundary between the two segments can reduce the 
possible number of positions that the segment can be connected to by essentially 
[...] segment. If no ordered segments are identified, the 
entire oligosaccharide is treated as one big segment and the score is calculated using 
2-cleavage fragments if possible. 

score(segment) = ( arrangements(subsegments) x Π score(s) x minimum number of attachment points ) / number of anchoring segments 

The number of arrangements of sub-segments is given by the above formula. A 
further adjustment of the number of structures created has to be performed due to 
uneven sub-segment size. For sub-segments that are larger than a single 
monosaccharide, the number of arrangements is increased based upon the number 
of sibling sub-segments. The number of attachment points is given by finding the 
smallest sub-segments of a sub-segment that the sub-segment has in common with its 
sibling segments. The number of anchoring segments is the number of sub-segments 
that the segment has grouped together with the segments from a 

Contrasted Intensities 




In order to further discriminate between matches using the segmentation scoring 
method, the contrasted intensity score is used. The contrasted intensity score is applied 
in two stages. The first stage looks at the total intensity matched to glycosidic cleavage 
fragments for a match in comparison to the total intensity matched to glycosidic 
cleavages for the other candidate structures. The second stage compares total intensity 
matched to cross-ring cleavages. 



Example 1 

Consider the hexasaccharide shown in Fig. 12, where the structure mass, shown 
in a), and four fragment masses (b-e) have been found. Initially, no rules are known 
about the arrangement of monosaccharides in the structure. 

a) With no fragments, the structure is segmented into a single segment containing 
all monosaccharides. 

Rules 

(1) S = {A, B, C, D, E, F} 

(2) M1 = S 

Arrangements 

= arrangements(M1) 
= 6^5 = 7776 

where arrangements is a function calculating the number of arrangements of a segment. 
The attach function (seen later) is the number of points at which a segment can attach to 
another segment. 

This is a purely mathematical number of arrangements of the monosaccharides 
in the structure, and does not accurately reflect reality, where the number of 
arrangements is limited by the number of possible attachment points for each 
monosaccharide. 






b) 

A fragment resulting in the loss of monosaccharides B-F or alternatively the 
loss of A only is found. Fragments resulting in the loss of B-F and fragments resulting 
in the loss of A are complementary. The segments M1 and M2 are created from the 
intersection of the new segments (S1 and S2) and the existing segment (S). 

Rules 

(3) S2 = {B, C, D, E, F} 

(4) S1 = {A} 

(5) S2 -> S1 

(6) M1 = S ∩ S1 

(7) M2 = S ∩ S2 

Arrangements 

= arrangements(M2) x attach(M2 -> M1) x arrangements(M1) 
= 5^4 x 1 x 1 = 625 



c) 

Rules 

(8) S3 = {A, B, F} 

(9) S4 = {C, D, E} 

(10) S4 -> S3 

(11) M3 = S2 ∩ S3 

(12) M4 = S2 ∩ S4 

Arrangements 

= arrangements(M4) x attach(M4 -> M3) x arrangements(M3) 
x attach(M3 -> M1) x arrangements(M1) 
= 3^2 x 2 x 2 x 1 x 1 = 36 
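
The arithmetic in steps a) to c) can be checked with a short Python sketch. It assumes that the ordered segments form a single chain running towards the reducing end, that a segment of k monosaccharides has k^(k-1) arrangements, and that, with no further evidence, a segment may attach to any monosaccharide of the next segment. The function names are hypothetical.

```python
from typing import Sequence


def arrangements(size: int) -> int:
    """Arrangements of a segment of `size` monosaccharides: size^(size-1)."""
    return size ** (size - 1)


def ordered_chain_count(sizes: Sequence[int]) -> int:
    """Product of arrangements and attachment points along a chain.

    `sizes` lists the segment sizes from the non-reducing side to the
    segment holding the reducing end; each segment attaches to the next
    one in the list at any of that segment's monosaccharides.
    """
    total = 1
    for index, size in enumerate(sizes):
        total *= arrangements(size)
        if index + 1 < len(sizes):
            total *= sizes[index + 1]  # attach(segment -> next segment)
    return total


if __name__ == "__main__":
    print(ordered_chain_count([6]))        # step a): one segment of six
    print(ordered_chain_count([5, 1]))     # step b): M2 -> M1
    print(ordered_chain_count([3, 2, 1]))  # step c): M4 -> M3 -> M1
```

Each added 1-cleavage fragment shortens the segments in the chain, which is why the count falls from 7776 to 625 and then to 36.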
[...] variability within M4, but the rest of the structure has all of its sequence fully 
supported. 

Rules 

(13) S5 = {A, B, C, D, E} 

(14) S6 = {F} 

(15) S6 -> S5 

Arrangements 

= [...] x arrangements(B) x attach(B -> M1) x arrangements(M1) 
= 3^2 x 1 x 1 x 1 x 1 x 1 x 1 = 9 



The 2-cleavage event only contains information about the grouping of 
monosaccharides, and does not contain information about complementary fragments. 
As such, we can only add a single rule to the set of rules. The number of arrangements 
for [...] is calculated in (1). 

Rules 

(16) S7 = {C, D} 

Arrangements 

M4 is split into two segments: M5 and a segment containing only C. There are 
two arrangements of this segment [...] 

Example 2 
a) 

Referring to Fig. 13, two single fragments have been [...] 
split into three segments: S1, S2 and S3 [...] (A, B, C) [...]. The total 
number of arrangements possible is 108. 

A 2-cleavage event is used to segment the structure from a). A resulting segment 
from this cleavage spans [...] 1-cleavage segments. In this case, the 1-cleavage 
segments are sub-segmented. Let the segment that this 2-cleavage would have created 
be S4. 

S4 = {[...], C, D} 

[...] = {[...], C} ∩ {[...], C, D} = {[...], C} 

Also, we know that S3 must attach to S4 [...]. Because of this, S3 can only attach to 
S2 via B and C. [...] of monosaccharides 
supported by this fragmentation. [...] the number of arrangements of 
S2 was reduced to 6. 

Relative Scoring 

Relative scoring methods will allow for differentiation [...] 
same quality score. One method which can be used is a matched intensity scoring 
method. Matched intensity can also [...] glycosidic cleavages [...] 
with or without concomitant [...]. Structures which are more 
correct will [...] 
of spectrum peaks matching with any fragments. 

The matched intensity score is particularly useful for distinguishing between isomers of 
structures, which may have an identical grouping score. The matched intensity score 
reflects the quantity of diagnostic fragments that have matched, 
and a difference in score suggests a difference in matched fragments. For ease of 
reporting, the matched intensity score can be converted into a missed intensity score, 
which is simply the sum of total intensities in the spectrum minus the matched intensity 
score. 
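
The conversion to a missed intensity score can be sketched in Python. This is an illustration only: it assumes the matched intensity score is the sum of the intensities of the spectrum peaks that matched at least one fragment, and the function names and the peak-index representation are hypothetical.

```python
from typing import Iterable, Sequence


def matched_intensity(peak_intensities: Sequence[float],
                      matched_peaks: Iterable[int]) -> float:
    """Sum the intensities of peaks matched by at least one fragment.

    matched_peaks holds indices into peak_intensities; duplicates are
    ignored so a peak matched by several fragments is counted once.
    """
    return sum(peak_intensities[index] for index in set(matched_peaks))


def missed_intensity(peak_intensities: Sequence[float],
                     matched_peaks: Iterable[int]) -> float:
    """Total spectrum intensity minus the matched intensity score."""
    return sum(peak_intensities) - matched_intensity(peak_intensities, matched_peaks)


if __name__ == "__main__":
    spectrum = [10.0, 5.0, 2.5]
    # Two candidates with the same grouping score but different matches.
    print(missed_intensity(spectrum, [0, 2]))
    print(missed_intensity(spectrum, [0]))
```

Under this sketch, the candidate with the lower missed intensity is the one whose fragments explain more of the spectrum.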
Iteration 

Based on the scoring, the data set size is further increased by adding more 
fragment types, and the process is performed again. The process can only be repeated 
until the experimental data set is exhausted of the required information, i.e. no unique 
fragments can be found that distinguish particular oligosaccharide candidates. In order 
to improve efficiency, only a portion of the spectrum may need to be used, or the 
process may only be performed against fragments which are the result of certain 
structures being fragmented. A structure which has at least one fragment which 
matches with a peak true mass will have a set of fragments associated with it. This 
fragment set is the set of fragments derived from the structure which have matched 
with the spectrum peak true masses. 

The initial data set used for GMF consists of only [...] of 
1-cleavage fragmentation as well as 2-cleavages which are formed exclusively from 
glycosidic cleavage types. This limited set of fragments provides enough data for the 
pre-sequence scoring method to work. The increase in data set size by adding more 
fragments is limited by refining the data set when required. This way, by restricting the 
types of fragments generated based upon the results of the scoring, it is possible to keep 
the data set size to a manageable size. 

[...] the accuracy of the GMF process without sacrificing speed, 
[...] of the structures returned [...] be valid candidate structures for the 
[...]. In order to exploit this, a more 
[...] is performed against the [...] out of the current 
result set. Extra fragments can be retrieved either from a slower secondary storage 
[...]. It is not necessary for the 
entire GMF solution space for fragments to be available in every GMF query. By 
taking advantage of properties of the sugar structure fragment patterns, it is 
possible to target the data set for each GMF [...]. 

As the data sets become more refined, and the possible solution set more 
narrow, the matched intensity score will increase. Initial data sets will contain generic 
fragments, and will not match more exotic fragments which may occur. However, these 
exotic fragments may not necessarily be useful in determining the correct result out of a 
large result set. For example, the intensity of the peak matching the fragment may be 
very low, or the fragment may occur in many of the structures. As the result set is reduced 
[...] of these [...] and they play a very important role 
in the selection of the most probable candidate structure. 



Results 

Figure 5 shows a graph of peaks from fragmentation of a glycan structure. 
[...] m/z 689.9 has been matched with two different fragments having the same mass. 
[...] is required to determine whether both the fragments that have 
[...] single one is the correct fragment. 






Example 1 

[...] oligosaccharide structure which is 
empirically fragmented is shown in Figure 6. Figure 7 shows its m/z spectra. Figures 
8a to 8c show a table of results illustrating how the method can distinguish between 
two isoforms of structure when the grouping score is the same by comparing the sum of 
the missed intensities, with the first structure being the correct structure and having a 
[...] despite both structures having the same [...] 
as determined by equation 1. 

Example 2 

The correct structure [...] 
lowest number of missed intensities. 

It will be appreciated by persons skilled in the art that numerous variations 
and/or modifications may be made to the invention as shown in the specific 
embodiments without departing from the spirit or scope of the invention as broadly 
described. The present embodiments are, therefore, to be considered in all respects as 
illustrative and not restrictive. 



Dated this thirty-first day of October 2003 



Proteome Systems Intellectual Property Pty Ltd 

Patent Attorneys for the Applicant 